1. INTRODUCTION

Deception can be defined as the process of hiding the truth from others by utilizing face and body
gestures [1]. People try to deceive others for many reasons. From a psychological perspective,
there are two types of deception: low-stakes (face-saving) and high-stakes (malicious) deception.
Most research aims to detect the second type. Moreover, a person who lies carries a higher
cognitive load than an innocent one, because deception requires thinking and imagining before
answering a question [2]. Recently, deception detection systems (DDSs) have been widely used in
applications such as security, criminal investigation, and terrorism detection [3].

Different studies have been performed in this field, each using either verbal or non-verbal cues
to detect deception. Amir et al. [4] designed a DDS based on measuring brain waves, which were
detected and measured for 18 subjects. Simbolon et al. [5] also used brain activity for deception
detection; their study was carried out on 11 participants using a support vector machine (SVM)
classifier, with a measured accuracy of 70.83%. Noje and Malutan [6] studied deception detection
using head movements with 10 participants and a detection accuracy of 58.25%. Thannoon et al. [7]
designed a DDS based on facial expressions encoded with the facial action coding system (FACS).
This system distinguished liars from innocent subjects by detecting eight action units (AUs). The
study was performed on 43 participants, and the detection accuracy of the suggested method is 84%.

This paper aims to design a DDS based on a hybrid (combined) features extraction technique. The
limitations of the related work above are the detection accuracy and the low number of
participants' video clips. The novelty of the proposed system lies in increasing the detection
accuracy and using a much greater number of video clips. Furthermore, logistic regression (LR) is
used for the first time in testing a database collected in an unconstrained environment with no
limitations on facial expressions, head movements, and eye gaze movements.


2. AUTOMATED DECEPTION DETECTION SYSTEM 

The automated DDS consists of three stages: data collection and pre-processing, features
extraction, and classification. The first stage records videos of the persons under test and
pre-processes the collected data, then applies face detection and landmark detection. In the
second stage, features are extracted as indicators of the deception state. These features are then
passed to the final stage to determine the class they belong to. Figure 1 shows the general block
diagram of the automated DDS. The three stages are explained below in more detail.


[Block diagram: first stage (data collection and pre-processing) -> face image output -> second
stage (features extraction) -> feature vector -> third stage (classifier) -> liar / truth-teller]

Figure 1. The general stages of the DDS [1]


2.1. Data collection and pre-processing stage 

The first stage collects videos (data) of the participants under test. It is then necessary to
determine the essential durations that contain important features for deception detection; the
results of this step are called video clips. These clips are passed to a face detection algorithm
to detect the subject's face and distinguish it from non-face parts (background). The resulting
face images are used by a feature point (landmark) detection algorithm, which places points on the
regions of interest in the subject's face image. These regions are the face border, nose, mouth,
eyebrows, and eyes.

One of the most accurate and well-known face detection algorithms is the Viola-Jones (VJ)
algorithm. It gained its popularity for several reasons: it is fast, accurate, and robust, it
detects multiple faces in a single image, and it operates in real-time face detection systems. For
landmark detection, constrained local neural fields (CLNF) are used [8]. The CLNF is considered
one of the most efficient and robust methods for landmark detection. To place feature points in a
3D point distribution model (PDM), (1) is applied, so each point is controlled by the parameters
[s, R, q, t] [9].


$$x_i = s \cdot R_{2D} \cdot (\bar{x}_i + \Phi_i q) + t \qquad (1)$$

where $\Phi_i$ is the principal component matrix, $\bar{x}_i = [\bar{x}_i, \bar{y}_i, \bar{z}_i]^T$
is the mean value of the $i$th feature, $q$ is an $m$-dimensional vector of parameters controlling
the non-rigid shape, $s$ is a scaling term that controls how close the face is to the camera, $t$
is a translation term, and $R_{2D}$ is a 2x3 rotation matrix.
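As a rough illustration of (1), the following Python sketch projects a single PDM landmark into the
image plane. The function names, the Euler-angle convention used to build the rotation, and all
numeric values are assumptions made for this example; they are not taken from [8] or [9].

```python
import math


def rotation_2x3(pitch, yaw, roll):
    """First two rows of R = Rz(roll) * Ry(yaw) * Rx(pitch), angles in radians.

    The composition order is an assumption for this sketch.
    """
    cx, sx = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cz, sz = math.cos(roll), math.sin(roll)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
    ]


def project_landmark(mean_xyz, phi_i, q, s, rot, t):
    """Evaluate x_i = s * R_2D * (mean_i + Phi_i q) + t for one landmark.

    mean_xyz: 3 values, phi_i: 3 x m matrix, q: m values,
    rot: 2 x 3 matrix, t: 2 values.
    """
    # Non-rigid deformation of the mean shape: mean_i + Phi_i q
    shape = [
        mean_xyz[k] + sum(phi_i[k][j] * q[j] for j in range(len(q)))
        for k in range(3)
    ]
    # Rigid part: rotate, scale, then translate into image coordinates
    return [s * sum(rot[r][k] * shape[k] for k in range(3)) + t[r] for r in range(2)]


# Hypothetical landmark with m = 2 shape parameters and a frontal pose
point = project_landmark(
    mean_xyz=[1.0, 2.0, 3.0],
    phi_i=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    q=[0.5, -0.5],
    s=2.0,
    rot=rotation_2x3(0.0, 0.0, 0.0),
    t=[10.0, 20.0],
)
print(point)
```

With zero rotation the 2x3 matrix simply drops the z coordinate, so the scaling and translation
terms are easy to check by hand.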


2.2. Features extraction 

The second stage in the DDS is features extraction. Three types of features are extracted: facial
expressions, head movements, and eye gaze. These features have a direct relationship with mental
processes, so they can effectively reveal deception.


2.2.1. Facial expressions 

Automatic systems for facial expression analysis and measurement have been widely adopted in
fields related to security, entertainment, clinical practice, and commerce. Facial features are
described and analyzed with a standard coding technique usually referred to as the facial action
coding system (FACS) [10]. FACS encodes each movement of a specific facial muscle in the form of
an action unit (AU). The detection of AUs depends on two types of features: geometry and
appearance [11]. Geometry-based features are measured from both landmark point locations and shape
parameters. Appearance features are extracted using histograms of oriented gradients (HOGs) [12].


2.2.2. Head movements 

Humans tend to use head movements as signs when they communicate or interact with others [13].
There are different head actions such as lowering, raising, and nodding, each related to a
specific meaning. For head tracking, the CLNF method is used; it depends on the generalized
adaptive view-based appearance model (GAVAM) for head pose tracking in varying illumination
conditions. The tracking method operates on an image sequence (video) and estimates head
translation and orientation in three dimensions. Translation is described along three axes: the
x-axis, y-axis, and z-axis. The values along these axes change when the subject's distance to the
camera changes. In addition, there are three rotational axes: roll, pitch, and yaw [14]. Figure 2
shows head movements according to these axes.


Figure 2. Possible head movements based on roll, pitch and yaw [14]


2.2.3. Eye gaze 

The signal taken from the human eyes is considered a rich source of information that is directly
related to mental processes. The direction of eye gaze reflects the internal state or the
information stored in the brain. This information indicates whether a person is imagining, lying,
remembering, making internal dialogues, or showing a subjection sign [15]. Eye gaze detection
passes through two steps: the first is referred to as eye-shape registration, and the second is
called appearance-based gaze estimation [16]. The first step identifies the shape of the eye
region by placing landmark points around it; CLNF is the algorithm used for locating and tracking
these landmark points. The second step determines appearance features for the eye region directly
from the pixels contained in the eye image [17].


2.3. Classifier

When the features extraction process is complete, it becomes necessary to apply decision
classifiers [18]. In the previous stage, three kinds of features are extracted: facial
expressions, head movements, and eye gaze. These features are combined and passed to the
classification stage [19]. In this work, a logistic regression (LR) classifier is applied. LR is
one of the most popular supervised learning algorithms and is usually used for binary (two-class)
classification problems [20]. In LR-based classification, a set of arbitrary inputs is given, and
the output is calculated by applying a function that represents the classification output. There
are two classes: class 0 and class 1. The output range must therefore be limited to 0-1. Different
functions can be used, but the sigmoid function is the most popular and widely used with LR.
Figure 3 shows the response of the S-shaped (logistic) function. The following equations show how
the response is computed with the sigmoid function [21].


$$z = w^T x \qquad (2)$$

$$z = w_0 x_0 + w_1 x_1 + \dots + w_n x_n \qquad (3)$$

$$y = f(z) = \frac{1}{1 + e^{-z}} \qquad (4)$$

Figure 3. Sigmoid function or logistic function curve
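Equations (2)-(4) can be sketched in a few lines of Python. The weights and feature values below
are made up for illustration, and the two-branch form of the sigmoid is an implementation choice
(it avoids overflow for large negative z), not something specified in the paper.

```python
import math


def linear_response(w, x):
    """z = w^T x = w_0*x_0 + w_1*x_1 + ... + w_n*x_n, as in (2) and (3)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    """y = 1 / (1 + e^(-z)), as in (4), written in a numerically stable form."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical weights and inputs; x[0] = 1 plays the role of the bias input
w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]
z = linear_response(w, x)   # 0.5 - 2.0 + 1.0 = -0.5
y = sigmoid(z)              # a value strictly between 0 and 1
print(z, y)
```

Because the sigmoid maps any real z into the interval (0, 1), the output can be compared directly
with a fixed threshold to choose between the two classes.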


3. DATABASE COLLECTION 

The collected database contains videos of 102 participants, 25 females and 77 males, aged from 18
to 55 years. During the interview, each participant was asked a set of questions that require
thinking before answering. The videos of all participants were recorded in an unconstrained
environment using a Canon 2000D digital camera. Figure 4 shows a sample image of a participant
during the interview.


Figure 4. Sample images for participants during the interview 


4. THE PROPOSED DECEPTION DETECTION SYSTEM (DDS) 

The proposed DDS consists of three main stages: video recording and pre-processing, features
extraction, and classification. The first stage records videos of volunteers and then edits them
before performing face and landmark detection; Figure 5 shows the details of this stage. The
features extracted from the collected videos are passed to the classification method to
distinguish liars from innocent subjects. Figure 6 shows the features extraction and
classification stages.

Figure 5. Video recording and pre-processing in DDS

Figure 6. Features extraction and classification stages for DDS


4.1. Video recording and pre-processing 

After the participants' videos are recorded, video editing is necessary. Editing means selecting
the necessary parts (frames) of the captured video, which results in a set of video clips.
Importantly, the edited video clips represent the durations in which participants are thinking.
The total number of resulting video clips is 888 (384 for truth and 504 for lie). The next step is
face detection, which in this work is based on the VJ algorithm. After applying the face detection
algorithm, the output face image is used for initializing the landmark points. The CLNF algorithm
is used for locating 68 points on the detected face image, as shown in Figure 7.


[Diagram labels: original image (frame), cascade AdaBoost, face image, landmarks detection]

Figure 7. The application of face and landmarks detection algorithm


4.2. Dynamic feature extraction 

Three kinds of features are extracted: facial expressions, head movements, and eye gaze. Features
extraction is a very important stage in a DDS for distinguishing truth from lie. Many features can
be used for a DDS; the most effective of them are discussed in detail in the following
sub-sections.


4.2.1. Action unit (AU) detection 

The AU detection process requires capturing two kinds of features: geometry and appearance
features. Geometry features depend on capturing both the feature (landmark) point locations and
the non-rigid shape parameters. To extract appearance features, any non-facial parts are first
removed from the given image. Figure 8 shows the essential steps in the AU detection process.

Figure 8. Facial AUs detection based on determining both appearance and geometry features


For each participant, eighteen AUs are extracted. Not all AUs have the same effect on the designed
DDS. Based on the collected dataset, a subset of the AUs has no direct impact on discriminating
liars from truth-tellers, so it is necessary to determine the effective set of AUs. Table 1 shows
the effective AUs with the associated facial muscles in the human face.


Table 1. The effective AUs in the designed DDS

Action Unit (AU)    Name based on FACS    Associated facial region
AU6                 Cheek Raiser
AU7                 Lid Tightener
AU10                Upper Lip Raiser
AU12                Lip Corner Puller
AU14                Dimpler
AU28                Lip Suck


4.2.2. Head movements detection 

Head movement detection is used to describe head translation and orientation (rotation). For
translation, the head location is represented along three axes: x, y, and z. For rotation, head
movements are described by Euler angles about three axes: pitch, yaw, and roll. These six features
(x-axis, y-axis, z-axis, pitch, yaw, and roll), which fully describe head movements, are
extracted. In psychology, participants moving their heads in a specific direction is taken to
indicate that they are deceiving the interviewer, whereas no movement is taken to indicate that
the subject is telling the truth.


4.2.2.1. The proposed head pose features (rotation)

For the pitch feature, the variance function is applied, and the output must be greater than
0.0004 (the discriminating threshold). Figure 9 shows the application of the variance function to
the pitch feature. The variance function measures the change or spread of data from the mean and
is calculated using (5).


$$\mathrm{variance}(x) = \sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n} \qquad (5)$$

where $x$ represents the input data, $\mu$ the mean value, and $n$ the data size.

Figure 9. Variance value for each clip with the discriminating threshold


For the yaw feature, the variance function is applied to the calculated yaw values. The variance
must be greater than 0.0001 to ensure sufficient distinction between the lie and truth states.
Figure 10 shows the application of the variance function to the yaw feature. For the roll feature,
the variance function is also applied, and the variance should be greater than 0.0001 to clearly
discriminate lie responses from truth responses. Figure 11 shows the variance of the roll feature
for each video clip in both the lie and truth states.

Figure 10. Variance value for each clip with the discriminating threshold

Figure 11. Variance value for each clip with the discriminating threshold
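The three rotation features can be sketched as follows. This is a minimal illustration that
assumes the variance divides by n (population variance) and that each clip provides per-frame
pitch, yaw, and roll values in the units for which the thresholds were chosen. The function names
and the sample clip are hypothetical.

```python
def variance(values):
    """Population variance: mean squared deviation from the mean (divides by n)."""
    n = len(values)
    mu = sum(values) / n
    return sum((v - mu) ** 2 for v in values) / n


# Discriminating thresholds quoted in the text for each rotation feature
ROTATION_THRESHOLDS = {"pitch": 0.0004, "yaw": 0.0001, "roll": 0.0001}


def rotation_flags(clip):
    """Return, per rotation axis, whether the clip's variance exceeds its threshold.

    clip maps "pitch", "yaw", and "roll" to lists of per-frame angles.
    """
    return {
        name: variance(clip[name]) > threshold
        for name, threshold in ROTATION_THRESHOLDS.items()
    }


# Hypothetical four-frame clip: the head nods and tilts but does not turn
clip = {
    "pitch": [0.0, 0.05, -0.05, 0.0],
    "yaw": [0.01, 0.01, 0.01, 0.01],
    "roll": [0.0, 0.04, 0.0, 0.04],
}
print(rotation_flags(clip))
```

A flag that is true means the clip shows more head rotation about that axis than the threshold
allows, which the text associates with the lie state.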

4.2.2.2. The proposed head pose features (translation)

The participant's head position is modeled along three axes, x, y, and z, measured in millimeters
(mm). The z value increases when the participant's head moves away from the camera and decreases
when it moves closer. The x-axis feature is computed for each frame, and then a chain of functions
is applied to it. First, a difference function calculates the difference between consecutive
frames. Second, the sign function is applied to the resulting data; it converts negative values to
-1 and positive values to +1, while zero values remain zero. Third, the difference function is
applied again in the same manner. The fourth and final step finds the locations of the nonzero
(positive or negative) elements for the purpose of calculating the length of the zero (0) values.
The feature value must be greater than 11 to discriminate between the lie and truth states. Figure
12 shows the value of the x-axis feature after applying the combined function set.

To measure the y-axis feature and discriminate between lie and truth video clips, a logical OR of
two conditions is used. The first condition is based on two functions: the difference function is
applied to the y-axis values of each video clip, and the result is passed to the mean function,
which calculates the mean value of the resulting data. Equation (6) gives the mathematical
expression of the mean function. The result must be greater than 0.3, which is the discriminating
threshold between lie and truth video clips.


$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (6)$$

The second condition uses the same functions as the first, but the result must be less than -0.3.
Thus, the two conditions differ only in the value of the discriminating threshold. Figure 13 shows
the extracted y-axis feature with the determined threshold.

Figure 12. x-axis value for each clip with the discriminating threshold

Figure 13. y-axis value for each clip with the discriminating threshold
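The x-axis and y-axis translation features can be sketched as below. The y-axis rule follows the
text directly (mean of the frame-to-frame differences above 0.3 or below -0.3). For the x-axis,
the sketch only goes as far as locating the nonzero entries after the difference, sign, and
difference chain; how those locations are reduced to the single value compared with 11 is not
fully specified in the text, so that last step is left out. All function names and sample values
are hypothetical.

```python
def diff(values):
    """Differences between consecutive elements (one element shorter than the input)."""
    return [b - a for a, b in zip(values, values[1:])]


def sign(v):
    """-1 for negative values, +1 for positive values, 0 for zero."""
    return (v > 0) - (v < 0)


def x_axis_nonzero_locations(x):
    """Apply difference -> sign -> difference and return indices of nonzero entries.

    A nonzero entry marks a frame where the direction of motion along x changes.
    """
    second_diff = diff([sign(d) for d in diff(x)])
    return [i for i, v in enumerate(second_diff) if v != 0]


def y_axis_flag(y, threshold=0.3):
    """Logical OR of the two conditions on the mean of the frame differences."""
    d = diff(y)
    mean_diff = sum(d) / len(d)
    return mean_diff > threshold or mean_diff < -threshold


# Hypothetical per-frame positions in millimeters
x = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0]
print(x_axis_nonzero_locations(x))      # direction changes at two places
print(y_axis_flag([0.0, 1.0, 2.0, 3.0]))  # steady upward drift
print(y_axis_flag([5.0, 5.1, 5.0, 5.1]))  # small jitter around a fixed position
```

Note that the mean of consecutive differences equals the net displacement divided by the number of
frame steps, so the y-axis rule effectively flags clips with a sustained drift in either direction.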

The z-axis feature is determined for each frame within each video clip. The variance function is
applied to this feature to determine the change (spread) from the mean. The output of the variance
function must be greater than 18, which is the discriminating threshold between lie and truth
video clips. Figure 14 shows the application of the variance function to the z-axis feature.

Figure 14. z-axis value for each clip with the discriminating threshold


4.2.3. Eye gaze detection 

Eye gaze detection, or eye gaze estimation, is the process of identifying the gaze direction
(where a participant is looking). Two features can be extracted from the participant's eyes. The
first is the eye gaze angle in the x direction, which relates to moving the gaze from left to
right. The second is the eye gaze angle in the y direction (up-down eye movement). In psychology,
a participant looking to the right is taken to be trying to imagine something that has not
occurred before, and imagining is in turn associated with deceiving others.


4.2.3.1. Eye gaze angle in x direction 

To discriminate the eye gaze angle in the x direction between the lie and truth states, two
features are extracted. The first is the variance of the eye gaze angle in the x direction. The
variance must be greater than 0.0004 to distinguish the lie state from the truth state. Figure 15
shows this feature with the defined discriminating threshold.

Figure 15. Variance of gaze angle in x direction with the discriminating threshold


The second feature for eye gaze in the x direction is extracted by applying the sign function,
given in (7). The output of the sign function is passed to a sum operation. The sum of the sign
values should be less than -21 to sufficiently distinguish between liar and truthful responses, as
shown in Figure 16.

$$\mathrm{sgn}(x) = \begin{cases} -1, & x < 0 \\ 0, & x = 0 \\ +1, & x > 0 \end{cases} \qquad (7)$$

Figure 16. Sum of sgn of gaze angle in x direction with the threshold
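The two x-direction gaze features can be sketched together. This assumes a population variance
(division by n) and per-frame gaze angles in the units for which the thresholds were chosen; the
function names and the sample clips are hypothetical.

```python
def variance(values):
    """Population variance: mean squared deviation from the mean (divides by n)."""
    n = len(values)
    mu = sum(values) / n
    return sum((v - mu) ** 2 for v in values) / n


def sign(v):
    """-1 for negative values, +1 for positive values, 0 for zero."""
    return (v > 0) - (v < 0)


def gaze_x_flags(gaze_x, var_threshold=0.0004, sign_sum_threshold=-21):
    """Return (variance feature exceeded, sign-sum feature below threshold)."""
    sign_sum = sum(sign(g) for g in gaze_x)
    return variance(gaze_x) > var_threshold, sign_sum < sign_sum_threshold


# Hypothetical clip: 30 frames with a negative gaze angle, 5 with a positive one
shifting_gaze = [-0.1] * 30 + [0.2] * 5   # sign sum = -30 + 5 = -25
steady_gaze = [0.0] * 10                  # no movement at all
print(gaze_x_flags(shifting_gaze))
print(gaze_x_flags(steady_gaze))
```

The sign sum counts how many more frames the gaze spends on the negative side than on the positive
side, so it depends on clip length as well as gaze direction.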


4.2.3.2. Eye gaze angle in y direction 

The eye gaze angle in the y direction refers to upward or downward gaze movement and is measured
in degrees. To discriminate between lie and truth video clips, the variance function is applied.
The variance must be greater than 0.0015, which is the discriminating threshold. Figure 17 shows
the variance value for each clip with the determined threshold value.

Figure 17. Variance of gaze angle in y direction with the specified threshold


5. METHOD 


The novelty of this work is a DDS based on the logistic regression (LR) classifier, which has not
been used in similar work. Classification is the final stage in the designed DDS. Its role is to
take the extracted features as input and determine the class each input belongs to, either liar or
truth-teller. The collected database contains 888 clips; 444 clips are used for training the LR
classifier, while the remaining 444 clips are used for testing.

The LR classifier is one of the most popular supervised machine learning algorithms and is an
essential classifier for binary problems. Its popularity is due to several reasons: it is easy to
train, has relatively low computational complexity, classifies new entries easily, and gives good
accuracy for large datasets. It draws a linear decision boundary between classes. The LR
classifier computes its output in simple steps. First, it computes the response by multiplying the
input feature vector by the weights. Then it applies the result to the activation function to
obtain the output, from which the decision (d) is taken. The decision-making process is based on
(8).


$$d = \begin{cases} 1, & \text{if output} \geq \text{threshold} \\ 0, & \text{if output} < \text{threshold} \end{cases} \qquad (8)$$


For each new entry, the LR classifier performs the above computations and finally compares the
output with the threshold to make the decision and determine the class of the given input. The
threshold value is 0.5. The decision of the LR classifier is therefore limited to one of two
values: 1 if the output is greater than or equal to 0.5, which represents the first class (liar),
or 0 if the output is less than the threshold value (0.5), which represents the second class
(truth-teller).
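The decision rule in (8) amounts to a single comparison. In the sketch below, the function and
dictionary names are hypothetical; the threshold of 0.5 and the class meanings come from the text.

```python
# Class labels as described in the text: 1 is the first class, 0 the second
LABELS = {1: "liar", 0: "truth-teller"}


def decide(output, threshold=0.5):
    """Decision d of (8): 1 if output >= threshold, otherwise 0."""
    return 1 if output >= threshold else 0


# Hypothetical classifier outputs, including the boundary case
for output in (0.91, 0.5, 0.49):
    d = decide(output)
    print(output, d, LABELS[d])
```

An output exactly equal to the threshold is assigned to the liar class, because (8) uses a
greater-than-or-equal comparison.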


6. RESULTS AND DISCUSSION 

The performance of the proposed deception detection system based on the logistic regression (LR)
classifier is examined. The LR classifier is tested on 444 samples that were selected randomly.
Table 2 shows the detection accuracy of the LR classifier. According to the table, 223 samples
from lie responses are classified correctly and placed in the first class (liar). In addition, 168
samples from truth-teller responses are classified correctly and labeled as truth (second class).

The overall number of correctly classified samples is therefore 391. There are 29 samples from lie
responses that are labeled with the wrong class and classified as truth; moreover, 24 samples from
truth responses are classified as belonging to the liar class. These classification errors
occurred due to the overlap in some of the extracted features. The final detection accuracy of the
LR classifier is measured based on (9) [22]:


$$\text{accuracy} = \frac{\text{total number of samples classified correctly}}{\text{total number of samples used for the testing phase}} \times 100\% \qquad (9)$$


Finally, the detection accuracy of the suggested DDS based on applying the LR classifier is equal 
to 88.0631%. 


Table 2. The detection accuracy of the suggested DDS based on using the LR classifier

                        Classifier output
The input response      Lie        Truth
Lie                     223        29
Truth                   24         168
Detection accuracy      88.0631%
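The reported accuracy can be checked directly from the counts in Table 2 using (9). The sketch
below also derives the per-class rates, which are not reported in the paper and are computed here
only from the same four counts; the variable names are illustrative.

```python
# Counts from Table 2, keyed by (input response, classifier output)
confusion = {
    ("lie", "lie"): 223,
    ("lie", "truth"): 29,
    ("truth", "lie"): 24,
    ("truth", "truth"): 168,
}

total = sum(confusion.values())                                        # 444 test samples
correct = confusion[("lie", "lie")] + confusion[("truth", "truth")]    # 391 correct
accuracy = 100.0 * correct / total

# Derived per-class rates: share of each input class that was classified correctly
lie_total = confusion[("lie", "lie")] + confusion[("lie", "truth")]
truth_total = confusion[("truth", "lie")] + confusion[("truth", "truth")]
lie_rate = 100.0 * confusion[("lie", "lie")] / lie_total
truth_rate = 100.0 * confusion[("truth", "truth")] / truth_total

print(f"accuracy = {accuracy:.4f}%")   # accuracy = 88.0631%
print(f"lie clips correct = {lie_rate:.2f}%, truth clips correct = {truth_rate:.2f}%")
```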


After explaining the details of the suggested system, it is necessary to compare it with previous
research works. However, this comparison is not entirely fair, for two reasons. First, the
recorded videos of the 102 participants were collected in an unconstrained (naturalistic)
environment, which means there is no constraint on the camera distance or lighting conditions,
while the previous works use publicly available databases from YouTube or other websites. Second,
the proposed system uses a hybrid technique in which three kinds of features are extracted and
used for discriminating liars from truth-tellers, while the previous studies use either one or two
kinds of features. These two reasons create a substantial difference between the proposed system
and the previously suggested works. The previous studies used different cues (features) for
detecting deception, such as facial expressions, facial micro-expressions, brain activity,
temperature change, head movements, and speech. Table 3 shows the comparison in terms of the
number of participants, type of features, and detection accuracy. It is clear from the table that
the suggested system achieves the highest detection accuracy and that the collected database
contains a greater number of subjects.

Table 3. Comparison between the proposed DDS and the previous studies

Research                       Year   No. of participants   Features details                                  Detection accuracy
Amir et al. [4]                2013   18                    Brain activity                                    ---
Azar and Campisi [23]          2014   11                    Temperature change                                84%
Simbolon et al. [5]            2015   11                    Brain activities                                  70.83%
Noje and Malutan [6]           2015   10                    Head movement                                     58.25%
Bedoya-Echeverry et al. [24]   2017   27                    Thermal imaging                                   79.2%
Azhan et al. [25]              2018   38                    Micro-expressions                                 76.2%
Thannoon et al. [7]            2019   43                    Facial expression                                 84%
Proposed DDS                   2021   102                   Facial expressions, head movements and eye gaze   88.0631%


7. CONCLUSION


The proposed DDS, based on a hybrid technique for features extraction, is designed and tested.
Three kinds of features are extracted: facial expressions, head movements, and eye gaze. For
facial expressions, which are encoded based on FACS, an optimization step is performed to select
only six effective AUs instead of eighteen. The resulting fifteen features are arranged in a
single vector and passed to the LR classifier, which is used for the first time in such work. The
use of 888 video clips in this work supported the aim of increasing the detection accuracy. The
final detection accuracy of the designed DDS with this classification algorithm is 88.0631%.